Abstract

To determine the accuracy with which Medicare claims data measure disease-free survival in 
elderly Medicare beneficiaries with cancer, we performed a criterion validation study. We merged 
gold-standard clinical trial data of 45 elderly patients with node-positive breast cancer who were 
treated on the Cancer and Leukemia Group B (CALGB) adjuvant breast trial 9344 with Centers
for Medicare and Medicaid Services (CMS) data files and compared the results of a CMS-based 
algorithm with the CALGB disease-free survival information to determine sensitivity and 
specificity. For 5-year disease-free survival, the sensitivity of the CMS-based algorithm was 100% 
(95% confidence interval [CI] = 81% to 100%), the specificity was 97% (95% CI = 83% to 
100%), and the area under the receiver operator curve was 97% (95% CI = 90% to 100%). For
2-year disease-free survival, the test characteristics were less favorable: sensitivity was 83% (95%
CI = 36% to 100%), specificity was 95% (95% CI = 83% to 100%), and area under the receiver 
operator curve was 84% (95% CI = 66% to 100%). 



The elderly are numerically under-represented (1-3) and possibly physiologically 
misrepresented (4) in clinical trials of cancer chemotherapy such that trial participants are 
potentially younger and healthier than the average elderly cancer patient. The general 
population of elderly Americans therefore may not experience the same benefits and 
toxicities of chemotherapy as trial participants. Nevertheless, clinicians need information 
about the expected benefits and toxicities of chemotherapy in the elderly. A solution to this 
problem is to supplement clinical trial results with observational results. Prior research has 
shown that Centers for Medicare and Medicaid Services (CMS) data can be used to 
accurately measure chemotherapy use (5,6), but the extent to which it can be used to 
measure outcomes that are traditionally reported in clinical trials, including conventional 
survival and toxicity endpoints, is unknown. 

Disease-free survival is one of the most common survival metrics in adjuvant chemotherapy 
trials and is defined as the time from enrollment to the first of two events: cancer relapse or 
death from any cause. Disease-free survival incorporates the morbidity associated with 
recurrent disease that overall survival does not and is thus most relevant to those cancers 
with long periods between recurrence and death (e.g., local or regionally advanced cancers 
of the breast, prostate, colon, and rectum). Indeed, the Food and Drug Administration has 
accepted disease-free survival as a regulatory endpoint that demonstrates clinical benefit for 
adjuvant therapy. 

We are unaware of previous attempts to measure disease-free survival through Medicare 
claims. This absence is not due to an inability to measure survival — the accuracy of CMS 
vital status information is well established (7). Rather, it is the result of ambiguity regarding
the ability of claims to capture cancer recurrence. As a preliminary study to determine the 
accuracy with which Medicare claims data capture disease-free survival in elderly breast 
cancer patients, we developed an algorithm for measuring both disease recurrence and death 
using Medicare data files and compared the results with an external gold-standard measure 
of disease-free survival, Cancer and Leukemia Group B (CALGB) clinical trial data. 

The model cohort included all patients aged > 65 years (n = 52) who consented, enrolled, 
and were treated on CALGB trial 9344, "Doxorubicin dose escalation, with or without 
Taxol, as part of the CA adjuvant regimen for node positive breast cancer" between January 
1, 1995, and December 31, 1997. These 52 patients represented only 5.5% (52 of 944) of the 
full enrollment during this period. Although this percentage of elderly enrollees seems
small, the trial was chosen because the absolute number of elderly enrollees was among the
largest of all phase 3 trials during that period.

We then carefully linked the patients' CALGB clinical trial data (e.g., disease-free survival 
information) to their CMS Medicare claims files (i.e., denominator, Carrier, Outpatient, and 
MedPAR files) from enrollment through December 31, 2000 (8). We were able to match 51 
(98%) of the 52 patients to Medicare files, a rate consistent with previous studies (9). Six 
patients were excluded because they were enrolled in health maintenance organizations and 
their claims were not processed through CMS (n = 3) or because they were not enrolled in 
Medicare part B (n = 3). Thus, the final analytic sample contained 45 patients. 

This study was approved by the University of Chicago and Massachusetts General Hospital 
institutional review boards and conducted in compliance with their regulations. Data quality 
was ensured by careful review of data by CALGB Statistical Center staff and by the study 
chairperson. Statistical analyses were approved and confirmed by CALGB statisticians. All 
analyses were two-sided and performed using STATA version 8 SE; P<.05 was considered 
statistically significant. 

We developed and ultimately refined a clinically intuitive algorithm to measure disease-free 
survival that required screening patients' Medicare claims from the calendar date of their 
enrollment on CALGB 9344 forward through their last Medicare claim up to December 31, 
2000 (the last day of available claims files), for evidence of cancer relapse or death. In 
clinical medicine, cancers are described according to primary anatomic sites (i.e., where the 
tumor originates) and secondary anatomic sites (i.e., location of tumor spread). The primary 
anatomic site is well documented in Surveillance, Epidemiology, and End Results (SEER) 
(10) and may be reasonably well documented in CMS files (11). However, whether (or 
when) the tumor has spread to anatomically distinct sites is not reported or measured by 
SEER, and it is not known whether such events are captured reliably in CMS claims. 
Because CMS claims use International Classification of Diseases 9th Revision-Clinical 
Modification (ICD-9-CM) diagnostic codes that do have distinct values for primary and 
secondary cancer sites, it is at least theoretically possible that CMS claims distinguish 
primary anatomic malignancy sites from secondary sites of primary cancer spread. 

In CMS files, we identified cancer relapse by evaluating MedPAR, Carrier, and Outpatient
files for ICD-9 diagnostic codes indicating secondary sites of cancer (Table 1) and then 
dated the relapse by "claim through date." From the universe of ICD-9-CM secondary sites 
of cancer codes, we omitted the code 198.81 indicating "secondary malignant neoplasm of the
breast" to avoid erroneous overcoding of patients' primary breast cancer as "relapse" and
ultimately omitted the code 198.89 "secondary malignant neoplasm of unspecified site"
because of its high false-positive rate. Date of death was obtained from CMS denominator 
files. The first of the two possible events (i.e., relapse or death) was chosen to represent 
disease-free survival according to CMS claims. Individuals without either event were 
censored at the date of their last CMS claim with a maximum date of December 31, 2000. 
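
The screening rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual program: the `Claim` record, the `cms_dfs` function, and the dotted-string format of the diagnosis codes are hypothetical names and conventions chosen for the example, and the code set simply mirrors the values listed in Table 1.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional, Tuple

STUDY_END = date(2000, 12, 31)  # last day of available claims files

# Secondary-site diagnosis codes as listed in Table 1 (assumed dotted strings).
RELAPSE_CODES = (
    {f"197.{i}" for i in range(9)}      # 197.0 through 197.8
    | {f"198.{i}" for i in range(8)}    # 198.0 through 198.7
    | {"198.8", "198.82"}
)


class Claim(NamedTuple):
    """Hypothetical, simplified claim record (MedPAR, Carrier, or Outpatient)."""
    through_date: date
    dx_codes: Tuple[str, ...]


class DfsResult(NamedTuple):
    event: int        # 1 = relapse or death, 0 = censored
    kind: str         # "relapse", "death", or "censored"
    event_date: date


def cms_dfs(enrolled: date, claims: Iterable[Claim],
            death_date: Optional[date] = None) -> DfsResult:
    """Return the first of relapse or death, else censor at the last claim."""
    window = [c for c in claims if enrolled <= c.through_date <= STUDY_END]

    # Relapse is dated by the "claim through date" of the earliest qualifying claim.
    relapse_dates = [c.through_date for c in window
                     if RELAPSE_CODES.intersection(c.dx_codes)]

    candidates = []
    if relapse_dates:
        candidates.append((min(relapse_dates), "relapse"))
    if death_date is not None and death_date <= STUDY_END:
        candidates.append((death_date, "death"))

    if candidates:
        # Earliest event wins; on a same-day tie the tuple order picks "death".
        when, kind = min(candidates)
        return DfsResult(1, kind, when)

    # No event: censor at the last claim in the window (or enrollment if none).
    last_claim = max((c.through_date for c in window), default=enrolled)
    return DfsResult(0, "censored", last_claim)
```

A patient with a bone-metastasis code (198.5) on a claim would be classified as relapsed on that claim's through date, whereas a patient whose claims carry only primary breast cancer codes would be censored at the last claim.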

We compared CMS disease-free survival information with the gold standard of CALGB 
disease-free survival information (censored at December 31, 2000) to calculate sensitivity, 
specificity, and area under the receiver operator curve of the disease-free survival indicator 
during the entire follow-up period and at traditional trial-censoring points (i.e., 2 years and 5 
years). We defined sensitivity as the proportion of the patients known (according to CALGB 
data) to have either died or relapsed who were correctly identified through CMS claims as 
such. We defined specificity as the proportion of the patients known (according to CALGB 
data) to be both alive and without disease relapse who were correctly identified through 
CMS claims as such. The global performance of the measures was summarized by the area 
under the receiver operator curve (graph of sensitivity versus 1 - specificity); the greater the 
area under the receiver operator curve (maximum 1.00), the better the discriminatory
accuracy of the measure. 
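
The definitions of sensitivity and specificity given above translate directly into a short computation over paired patient-level indicators. The sketch below is illustrative only: the function names, the 0/1 coding of the indicators, and the use of a day count for the censoring horizon are assumptions, and the example data are invented rather than taken from the study.

```python
from typing import Sequence, Tuple


def event_by(days_to_event: int, had_event: bool, horizon_days: int) -> int:
    """Dichotomize follow-up at a horizon: 1 if an event occurred by then."""
    return int(had_event and days_to_event <= horizon_days)


def sensitivity_specificity(gold: Sequence[int],
                            test: Sequence[int]) -> Tuple[float, float]:
    """Compare a test indicator with a gold-standard indicator (both 0/1)."""
    if len(gold) != len(test):
        raise ValueError("gold and test must have the same length")
    tp = sum(1 for g, t in zip(gold, test) if g == 1 and t == 1)
    fn = sum(1 for g, t in zip(gold, test) if g == 1 and t == 0)
    tn = sum(1 for g, t in zip(gold, test) if g == 0 and t == 0)
    fp = sum(1 for g, t in zip(gold, test) if g == 0 and t == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one gold-standard event and one non-event")
    # Sensitivity: events correctly identified; specificity: non-events correctly identified.
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: 3 gold-standard events, 5 gold-standard non-events.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
claims_based = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(gold, claims_based)
```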

The sample was exclusively female, the mean age was 69.6 years (standard deviation ± 4.2
years), and 82% were white, 11% black, and 7% Hispanic. During the entire follow-up 
period (i.e., a maximum of 2162 days in the CALGB data source and 2142 days in the CMS 
data source), 15 (33%) of the 45 patients had relapsed or had died according to CMS claims. 
By comparison, the gold-standard source of survival information noted, as a first event, 
relapse or death in 14 (31%) of 45 (P = .84) (Supplementary Figure, available at
http://jncicancerspectrum.oxfordjournals.org/jnci/content/vol98/issue18). The 5-year disease-free
survival sensitivity (100%, 95% confidence interval [CI] = 81% to 100%), specificity (97%,
95% CI = 83% to 100%), and area under the receiver operator curve (97%, 95% CI = 90%
to 100%) were favorable, each being more favorable than 2-year disease-free survival 
(sensitivity = 83%, 95% CI = 36% to 100%; specificity = 95%, 95% CI = 83% to 100%; and 
area under the receiver operator curve = 84%, 95% CI = 66% to 100%) (Table 2). 

We compared results of the CMS algorithm to the gold-standard data source with respect to 
the three components of disease-free survival (i.e., relapse, death, and censoring) at 5 years. 
According to the gold-standard data source, 86% (12 of 14) of the disease-free survival 
failures were the result of relapsed disease and 14% (2 of 14) were the result of death (Table 
3). The ICD-9-CM codes for "secondary malignant neoplasm" applied to CMS data 
correctly categorized 11 of these 12 relapsed patients in CALGB data (sensitivity = 92%,
95% CI = 66% to 100%) and correctly categorized 31 of the 33 nonrelapsed patients 
(specificity = 94%, 95% CI = 82% to 99%). 
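
With counts this small, interval estimates for proportions such as 11 of 12 are usually computed with an exact binomial method. The sketch below implements the Clopper-Pearson interval using only the standard library. It is offered as an illustration: the text does not state which interval method was used, so the output need not reproduce the intervals reported here, and the function names are hypothetical.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f: Callable[[float], float]) -> float:
    """Bisection on [0, 1] for a monotone function with a sign change."""
    lo, hi = 0.0, 1.0
    f_lo_positive = f(lo) > 0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided Clopper-Pearson interval for x successes out of n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    # Lower bound solves P(X >= x | p) = alpha / 2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2; it is 1 when x == n.
    upper = 1.0 if x == n else _root(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


# Relapse sensitivity and specificity counts from the text.
relapse_sens_ci = exact_binomial_ci(11, 12)
relapse_spec_ci = exact_binomial_ci(31, 33)
```

When every case is classified correctly (x equal to n), the upper bound is fixed at 1 and only the lower bound carries information, which is why intervals for a 100% estimate are effectively one-sided.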

We measured the patient-level difference in the gold-standard and CMS algorithm time
variables. The median difference in CALGB compared with CMS time was 40 days, and the 
inter-quartile range was 12-86 days. The mean difference was 99 days (95% CI = 4.5 to 193 
days). 
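
The summary statistics for the patient-level time differences can be computed along the following lines. This is a sketch with assumptions labeled: the function name is hypothetical, the sign convention of the differences is left to the caller, the quartiles use the default method of `statistics.quantiles`, and the interval for the mean uses a normal approximation, which may differ from the method used in the analysis.

```python
from statistics import NormalDist, mean, median, quantiles, stdev
from typing import Dict, Sequence


def summarize_differences(diffs: Sequence[float]) -> Dict[str, object]:
    """Median, inter-quartile range, mean, and approximate 95% CI of the mean."""
    q1, _, q3 = quantiles(diffs, n=4)           # quartile cut points
    m = mean(diffs)
    z = NormalDist().inv_cdf(0.975)             # about 1.96
    half_width = z * stdev(diffs) / len(diffs) ** 0.5
    return {
        "median": median(diffs),
        "iqr": (q1, q3),
        "mean": m,
        "mean_ci": (m - half_width, m + half_width),
    }


# Invented differences in days, for illustration only.
summary = summarize_differences([10, 20, 30, 40, 50])
```

A mean well above the median, as reported here, indicates a right-skewed distribution in which a few patients have large discrepancies between the two sources.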

This preliminary validation study of Medicare claims shows that for elderly Medicare 
beneficiaries with histories of lymph node-positive breast cancer who were treated on a 
randomized phase 3 CALGB adjuvant chemotherapy trial, contemporaneous Medicare 
claims files reflect subsequent 5-year disease-free survival with a high degree of accuracy. 
The algorithm also appears to distinguish relapse from death within the domain of
disease-free survival, which suggests that the secondary anatomic site codes may capture cancer
recurrence. Taken with results of prior work documenting the ability of CMS claims to 
accurately measure certain anticancer therapies (6,12,13), these results suggest that CMS 
data capture some key elements of the clinical trial paradigm, and thus existing CMS-based 
data sources (e.g., SEER-Medicare) may be leveraged to yield clinical information regarding 
the effectiveness of adjuvant cancer treatments in the general population of elderly with 
solid tumors. Future research will focus on confirming these findings in a larger and more 
diverse patient sample. 

Table 1

Algorithm for measuring disease-free survival in Centers for Medicare and Medicaid Services (CMS) claims files

Code type          Value           Description
ICD-9 Dx*          197.0           Secondary malignant neoplasm of the lung
                   197.1           Secondary malignant neoplasm of the mediastinum
                   197.2           Secondary malignant neoplasm of the pleura
                   197.3           Secondary malignant neoplasm of other respiratory organs
                   197.4           Secondary malignant neoplasm of the small intestine, including duodenum
                   197.5           Secondary malignant neoplasm of the large intestine and rectum
                   197.6           Secondary malignant neoplasm of the retroperitoneum and peritoneum
                   197.7           Secondary malignant neoplasm of the liver
                   197.8           Secondary malignant neoplasm of the other digestive organs and spleen
                   198.0           Secondary malignant neoplasm of the kidney
                   198.1           Secondary malignant neoplasm of other urinary organs
                   198.2           Secondary malignant neoplasm of the skin
                   198.3           Secondary malignant neoplasm of the brain and spinal cord
                   198.4           Secondary malignant neoplasm of the other parts of the nervous system
                   198.5           Secondary malignant neoplasm of the bone and bone marrow
                   198.6           Secondary malignant neoplasm of the ovary
                   198.7           Secondary malignant neoplasm of the adrenal gland
                   198.8           Secondary malignant neoplasm of other sites
                   198.82          Secondary malignant neoplasm of the genital organs
Death indicator†   Date of death   Field contains date of death of Medicare beneficiary

* ICD-9 = International Classification of Diseases 9th Revision codes applied to CMS ambulatory and hospital files (i.e., Carrier, Outpatient, MedPAR).

† CMS denominator file.

Table 2

Test characteristics of the Centers for Medicare and Medicaid Services (CMS) disease-free survival algorithm*

Censoring period   Sensitivity (95% CI)   Specificity (95% CI)   Area under ROC (95% CI)
2 y                83% (36% to 100%)      95% (83% to 100%)      84% (66% to 100%)
5 y                100% (81% to 100%)     97% (83% to 100%)      97% (90% to 100%)

* The test characteristics were estimated by comparing CMS files with gold-standard clinical trial data pertaining to 45 elderly women with lymph node-positive breast cancer treated on the adjuvant chemotherapy trial CALGB 9344. CI = confidence interval; ROC = receiver operator curve.

Table 3

Five-year disease-free survival censoring proportions for Cancer and Leukemia Group B (CALGB) versus Centers for Medicare and Medicaid Services (CMS) data

Censoring category             CMS data   CALGB gold-standard data
Disease-free survival = 1
  Relapsed                     13         12
  Dead                         2          2
Disease-free survival = 0
  Censored                     30         31
Total                          45         45